# English Visual Understanding
VL Rethinker 7B 6bit
Apache-2.0
A multimodal model based on Qwen2.5-VL-7B-Instruct that supports visual question answering, converted to MLX format to run efficiently on Apple silicon.
Text-to-Image
Transformers English

mlx-community
19
0
Brahmai Clip V0.1
MIT
A CLIP model that pairs a ViT-L/14 image encoder with a masked self-attention Transformer text encoder, intended for zero-shot image classification research.
Text-to-Image
Transformers English

brahmairesearch
12.53k
0
Uform Gen
Apache-2.0
UForm-Gen is a small generative vision-language model used primarily for image captioning and visual question answering.
Image-to-Text
Transformers English

unum-cloud
152
44
Hashtaggenerater
Flickr30k is an English dataset for image-to-text tasks, commonly used for training and evaluating image caption generation models.
Image-to-Text
Transformers English

kusumakar
24
2
Featured Recommended AI Models